Abstract

This paper studies the influence of perturbations of conjugate priors in Bayesian inference.
A perturbed prior is defined inside a larger family, local mixture models, and the effect on
posterior inference is studied. The perturbation, in some sense, generalizes the linear perturbation
studied in Gustafson (1996). It is intuitive, naturally normalized and flexible for statistical
applications. Both global and local sensitivity analyses are considered. A geometric approach
is employed for optimizing the sensitivity direction function, the difference between posterior
means and the divergence function between posterior predictive models. All the sensitivity
measure functions are defined on a convex space with non-trivial boundary which is shown to
be a smooth manifold.

Keywords: Bayesian sensitivity; Local mixture model; Perturbation space; Newton's method;
Smooth manifold.


1 Introduction 


Statistical analyses are often performed under certain assumptions which are not directly
validated. Hence, there is always interest in investigating the degree to which a statistical inference
is sensitive to perturbations of the model and data. Specifically, in a Bayesian analysis for which
conjugate priors have been chosen, the sensitivity of the posterior to prior choice is an important
issue. A rich literature on sensitivity to perturbations of data, prior and sampling distribution
exists in Cook (1986), McCulloch (1989), Lavine (1991), Ruggeri and Wasserman (1993), Blyth
(1994), Gustafson (1996), Critchley and Marriott (2004), Linde (2007) and Zhu, Ibrahim, and
Tang (2011).


Sensitivity analysis with respect to a perturbation of the prior, which is the focus of this paper,
is commonly called robustness analysis. A comprehensive review of the literature and of existing
methods can be found in Insua and Ruggeri (2000). In robustness analysis it is customary to choose
a base prior model and a plausible class of perturbations. The influence of a perturbation is
assessed either locally or globally, by measuring the divergence of certain features of the posterior
distribution. For instance, Gustafson (1996) studies linear and non-linear model perturbations,
and Weiss (1996) uses a multiplicative perturbation to the base prior and specifies the important
perturbations using the posterior density of the parameter of interest. Common global measures of
influence include divergence functions (Weiss 1996) and relative sensitivity (Ruggeri and
Sivaganesan 2000). Note that any analysis depends heavily on the selected influence measure; see in
particular Sivaganesan (2000).


In local analysis, the rate at which a posterior quantity changes, relative to the prior, quantifies
sensitivity (Gustafson 1996; Linde 2007; Berger et al. 2000). Gustafson (1996), which we follow
closely, obtains the direction in which a certain posterior expectation has the maximum sensitivity
to prior perturbation by considering a mapping from the space of perturbations to the space of
posterior expectations. In Linde (2007), the Kullback-Leibler and divergence functions are
utilized for assessing local sensitivity with respect to a multiplicative perturbation of the base
prior or likelihood model. There, the local sensitivity is approximated using the Fisher information
of the mixing parameter in additive and geometric mixing.

In this paper we consider both local and global sensitivity analyses with respect to
perturbations of a conjugate base prior. We aim for three important properties for our method. Firstly,
we want a well-defined perturbation space whose structure allows the analyst to select the
generality of the perturbation in a clear way. Secondly, we want the space to be tractable, hence
we look at convex sets inside linear spaces. Finally, in order to allow for meaningful
comparisons, we want the space to be consistent with elicited prior knowledge. Thus if a subject matter
expert indicates a prior moment or quantile has a known value - or if a constraint such as symmetry
is appropriate - then all perturbed priors should be consistent with this information.

Such an approach to defining the perturbation space extends the linear perturbations studied in
Gustafson (1996) in all three ways. We do not require the same positivity condition, but rather use
one which is more general and returns naturally normalized distributions. Further, our space is highly
tractable, due to intrinsic linearity and convexity. Finally it is clear, with our formulation, how to
remain consistent with prior information which may have been elicited from an expert. The cost
associated with this generalisation is the boundary defined by (1) in Section 2.1 and the methods we
have developed to work with it. We can also compare our method with the geometric approach of
Zhu et al. (2011), which uses a manifold based approach. Our more linear approach considerably
improves interpretability and tractability while sharing an underlying geometric foundation.

In the examples of this paper we work with our perturbation space in three ways. Similarly to
Gustafson (1996) and Zhu et al. (2011), in Example 1 we look for the worst possible perturbation,
both locally and globally. In Example 2 we add constraints to the perturbation space, representing
prior knowledge, and again look for maximally bad local and global perturbations. Finally, in
Example 3, we marginalise over the perturbation space - rather than optimising over it - as a way
of dealing with the uncertainty of the prior.

The paper is organized as follows. In Section 2 the perturbation space is introduced and
its properties are studied. Sections 3 and 4 develop the theory of local and global sensitivity
analysis. Section 5 describes the geometry of the perturbation parameter space and proposes
possible algorithms for quantifying local and global sensitivity. In Section 6 we examine three
examples. The proofs are sketched in the Appendix.


2 Perturbation Space 

2.1 Theory and Geometry 


Anaya-Izquierdo and Marriott (2007). For more details about convex and differential geometry
see Berger (1987) and Amari (1990).



Definition 1 For the family of mean parameterized models $f(x;\theta)$ the perturbation space is
defined by the family of models $f(x;\theta,\lambda)$ such that,

(i) $f(x;\theta,0) = f(x;\theta)$ for all $\theta$.

(ii) $f(x;\theta_0,\lambda) - f(x;\theta_0)$ is Fisher orthogonal to the score of $f(x;\theta)$ at $\theta_0$.

(iii) For fixed $\theta_0$ the $f(x;\theta_0,\lambda)$ space is affine in the mixture ($-1$) affine geometry
defined in Marriott (2002).


A natural way to implement Definition 1 is to extend the family $f(x;\theta)$ by attaching to it, at
each $\theta_0$, the subfamily $f(x;\theta_0,\lambda)$, which is finite dimensional and spanned by a set of linearly
independent functions $v_j(x;\theta_0)$, $j = 1,\cdots,k$, all Fisher orthogonal to the score of $f(x;\theta)$ at $\theta_0$.
Thus, the subfamily $f(x;\theta_0,\lambda)$ can be defined as the linear space
$f(x;\theta_0) + \sum_{j=1}^{k} \lambda_j v_j(x;\theta_0)$, where $\lambda_j$ is a component of the vector $\lambda$.
For $f(x;\theta_0,\lambda)$ to be a naturally normalized density, we need
two further restrictions: (i) $\int v_j\,dx = 0$, and (ii) the $\lambda$ parameters must be restricted such that each
subfamily is non-negative for all $x$. This defines the parameter space as

    $$\Lambda_{\theta_0} = \Big\{\lambda \;\Big|\; f(x;\theta_0) + \sum_{j=1}^{k} \lambda_j v_j(x;\theta_0) > 0, \text{ for all } x\Big\}. \qquad (1)$$

Note the space $\Lambda_{\theta_0} \subset \mathbb{R}^k$ is an intersection of half-spaces and consequently is convex
(Berger 1987, Ch. 11).


Clearly, to construct such a perturbation space, the functions $v_j$ must be selected. A particular
form of Definition 1 with naturally specified $v_j$'s is the family of local mixture models. This family
is introduced in Marriott (2002) as an asymptotic approximation to a subspace of continuous
mixture models with small mixing variation relative to the total variation. Because of this small,
or local, assumption, all perturbations are, in some sense, close to the baseline prior, and so any
correspondingly large changes in the posterior will be of interest, as we show in the examples.

Definition 2 The local mixture of a regular exponential family $f(x;\theta)$ of order $k$ via its mean
parameterization, $\theta$, is defined as

    $$h(x;\lambda,\theta) = f(x;\theta) + \lambda_2 f^{(2)}(x;\theta) + \cdots + \lambda_k f^{(k)}(x;\theta), \quad \lambda \in \Lambda_\theta \subset \mathbb{R}^{k-1}, \qquad (2)$$

where $\lambda = (\lambda_2,\cdots,\lambda_k) \in \Lambda_\theta$ and $f^{(j)}(x;\theta) = \frac{\partial^j}{\partial\theta^j} f(x;\theta)$, $(j = 1,\cdots,k)$.
Also, $\Lambda_\theta$, for any fixed and known $\theta$, is a convex space defined by a set of supporting hyperplanes.

For a regular exponential family $\int f^{(j)}(x;\theta_0)\,dx = 0$, and as shown in Morris (1982), for a natural
exponential family, the $f^{(j)}(x;\theta_0)$'s are linearly independent and all Fisher orthogonal to the score
function at $\theta_0$. This family is identifiable in all parameters and behaves locally similarly to genuine
mixture models, yet it is richer in the sense that, compared to a regular density function with the
same mean, it can also produce smaller variance. Further properties of these models are studied
in Anaya-Izquierdo and Marriott (2007).
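The remark that a local mixture keeps the mean but can have a smaller variance than the base density can be checked numerically. The following is a minimal sketch, assuming a unit-variance normal base density $N(\theta, 1)$, for which the $j$th derivative with respect to $\theta$ equals the density times the probabilists' Hermite polynomial $He_j(x - \theta)$. The function name `local_mixture_normal` and the values of `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def local_mixture_normal(x, lam, theta=0.0):
    """Order-4 local mixture of N(theta, 1): f(x) * (1 + sum_j lam_j * He_j(x - theta))."""
    z = np.asarray(x, dtype=float) - theta
    base = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Hermite coefficients for He_0..He_4: He_0 has weight 1 and He_1 is absent,
    # which is why the mean of the base density is preserved.
    coef = [1.0, 0.0, lam[0], lam[1], lam[2]]
    return base * hermeval(z, coef)

# Illustrative (hypothetical) perturbation with negative lambda_2.
x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]
lam = (-0.1, 0.0, 0.05)
g = local_mixture_normal(x, lam, theta=1.0)

total = np.sum(g) * dx                      # normalisation
mean = np.sum(x * g) * dx                   # should match theta
var = np.sum((x - mean) ** 2 * g) * dx      # base variance plus 2 * lambda_2
print(total, mean, var, g.min() >= 0.0)
```

With a negative $\lambda_2$ and a positive $\lambda_4$ large enough to keep the density non-negative, the variance falls below that of the base model, which a genuine mixture over the mean could not achieve.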


2.2 Prior Perturbation 

Suppose the base prior model is $\pi_0(\mu;\theta)$, the probability (density) function of a natural exponential
family with the hyper-parameter $\theta$.

Definition 3 The perturbed prior model corresponding to $\pi_0(\mu;\theta)$ is defined by

    $$\pi(\mu,\lambda;\theta) := \pi_0(\mu;\theta) + \sum_{j=2}^{k} \lambda_j \pi_0^{(j)}(\mu;\theta) = \pi_0(\mu;\theta)\Big\{1 + \sum_{j=2}^{k} \lambda_j q_j(\mu,\theta)\Big\}, \quad \lambda \in \Lambda_\theta, \qquad (3)$$

where $\lambda = (\lambda_2,\lambda_3,\cdots,\lambda_k)$ is the perturbation parameter vector, and
$q_j(\mu,\theta) = \pi_0^{(j)}(\mu;\theta)/\pi_0(\mu;\theta)$ are polynomials of degree $j$.

In Definition 3, $\pi_0$ is perturbed linearly, similar to the linear perturbation

    $$\pi(\cdot,\pi_0,u^*) = \pi_0(\cdot) + u^*(\cdot), \quad u^*(\cdot) > 0, \qquad (4)$$

studied in Gustafson (1996), but with a different positivity condition, and is, as we shall show, very
interpretable for applications. Definition 3 can also be seen as the multiplicative perturbation
model $\pi(\mu,\lambda;\theta) = \pi_0(\mu;\theta)\,h^*(\mu;\lambda,\theta)$ studied in Linde (2007).

As shown in Anaya-Izquierdo and Marriott (2007), the base and perturbed models share the
same mean $\theta$; however, the perturbation is implemented through changing the higher order
moments by adding linear combinations of $\lambda$. This fact guarantees the properties mentioned in
Section 1.


3 Local Sensitivity 

In this section we study the influence of local perturbations, defined inside the perturbation space,
on the posterior mean. Similar to Gustafson (1996) we obtain the direction of sensitivity using
the Fréchet derivative of a mapping between two normed spaces. Throughout the rest of the
paper we denote the sampling density and base prior by $f(x;\mu)$ and $\pi_0(\mu;\theta)$, respectively, and
$x = (x_1,\cdots,x_n)$ represents the vector of observations.

Lemma 1 Under the prior perturbation (3), the perturbed posterior model is

    $$\pi_p(\mu,\lambda \mid x;\theta) = \frac{\pi_p^0(\mu \mid x,\theta)}{\xi(\lambda,\theta)} \Big\{1 + \sum_{j=2}^{k} \lambda_j q_j(\mu,\theta)\Big\}, \quad \lambda \in \Lambda_\theta, \qquad (5)$$

with $\xi(\lambda,\theta) = 1 + \sum_{j=2}^{k} \lambda_j E_p^0[q_j(\mu,\theta)] > 0$, where $\pi_p^0(\mu \mid x,\theta)$ and $E_p^0(\cdot \mid x)$
are the posterior density and posterior mean of the base model.

The following lemma characterizes the moments of the perturbed posterior model. Note that,
throughout the rest of the paper, for simplicity of exposition, we suppress the explicit dependence
of $q_j$, $\pi_p$ and $\pi_p^0$ on $\theta$.

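Lemma 2 The moments of the perturbed posterior distribution are given by

    $$E_p(\mu^r \mid x,\lambda) = \frac{E_p^0(\mu^r \mid x) + \sum_{j=2}^{k} \lambda_j A_j^r(x)}{\xi(\lambda)}, \quad \lambda \in \Lambda_\theta, \qquad (6)$$

where $A_j^r(x) = E_p^0(\mu^r q_j(\mu) \mid x)$.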
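For the normal conjugate model, the base-posterior expectations in Lemma 2 can be evaluated exactly by Gauss-Hermite quadrature, since each $q_j$ is a polynomial. The sketch below is illustrative only: it assumes a $N(\mu, \sigma^2)$ sampling model with known $\sigma$, a $N(\theta, \sigma_0^2)$ base prior (so that $q_j(\mu) = He_j(z)/\sigma_0^j$ with $z = (\mu - \theta)/\sigma_0$), order $k = 4$, and hypothetical function names.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def q(j, mu, theta, sigma0):
    """q_j(mu) = pi_0^(j) / pi_0 for a N(theta, sigma0^2) base prior: He_j(z) / sigma0^j."""
    z = (mu - theta) / sigma0
    return hermeval(z, [0.0] * j + [1.0]) / sigma0**j

def perturbed_posterior_moment(r, lam, xbar, n, theta, sigma0, sigma=1.0, deg=40):
    """E_p(mu^r | x, lam) as in Lemma 2, for lam = (lambda_2, lambda_3, lambda_4)."""
    v = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)           # base posterior variance
    m = v * (n * xbar / sigma**2 + theta / sigma0**2)    # base posterior mean
    t, w = hermegauss(deg)                               # nodes/weights for exp(-t^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)                         # weights now sum to one
    mu = m + np.sqrt(v) * t                              # nodes under the base posterior
    pert = sum(l * q(j, mu, theta, sigma0) for j, l in zip((2, 3, 4), lam))
    xi = 1.0 + np.sum(w * pert)                          # normalising constant of Lemma 1
    return np.sum(w * mu**r * (1.0 + pert)) / xi

# Hypothetical inputs: sample mean 1 from n = 15 observations, N(2, 1) base prior.
print(perturbed_posterior_moment(1, (0.1, 0.0, 0.05), xbar=1.0, n=15, theta=2.0, sigma0=1.0))
```

Setting `lam = (0, 0, 0)` recovers the usual conjugate posterior mean, which gives a quick sanity check on the quadrature.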


To quantify the magnitude of perturbation we exploit the size function as defined in Gustafson
(1996), i.e., the $L^p$ norm of the ratio $u/\pi_0$ for $p < \infty$, with respect to the measure induced by $\pi_0$.
Accordingly, the size function for $u(\cdot)$ is

    $$\mathrm{size}(u) = \Big[ E_{\pi_0} \Big| \sum_{j=2}^{k} \lambda_j q_j(\mu) \Big|^{p} \Big]^{1/p}$$

which, (i) is a finite norm and (ii) is invariant with respect to change of the dominating measure
and also with respect to any one-to-one transformation on the sample space. Clearly, $\mathrm{size}(u)$
is finite if the first $k + p$ moments of $\pi_0(\mu,\theta)$ exist. In addition, property (ii) holds by use of the
change of variable formula and the fact that for any one-to-one transformation $m = \nu(\mu)$ we have
$\tilde\pi_0^{(j)}(m,\theta)/\tilde\pi_0(m,\theta) = \pi_0^{(j)}(\mu,\theta)/\pi_0(\mu,\theta)$.

For a mapping $T : U \to V$, where $U$ and $V$ are, respectively, the perturbation space normed
with $\mathrm{size}(\cdot)$, and the space of posterior expectations normed with absolute value, the Fréchet
derivative at $u_0 \in U$ is defined by the linear functional $\dot T(u_0) : U \to V$ satisfying

    $$\| T(u_0 + u) - T(u_0) - \dot T(u_0)u \|_V = o(\|u\|_U),$$

in which $\dot T(u_0)u$ is the rate of change of $T$ at $u_0$ in direction $u$. Let $\mathrm{Cov}_p(\cdot,\cdot)$ be the posterior
covariance with respect to the base model. Theorem 1 expresses $\dot T(u_0)u$ as a linear function of $\lambda$,
at $u_0 = 0$ which corresponds to the base prior model.

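Theorem 1 $\dot T(0)u$ is a linear function of $\lambda$ as

    $$\varphi(\lambda) = \sum_{j=2}^{k} \lambda_j \,\mathrm{Cov}_p(\mu, q_j(\mu)), \quad \lambda \in \Lambda_\theta. \qquad (7)$$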
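As an illustration of Theorem 1, the coefficients $a_j = \mathrm{Cov}_p(\mu, q_j(\mu))$ of the linear function $\varphi(\lambda) = \sum_j \lambda_j a_j$ can be computed by quadrature in the normal conjugate case. This is a minimal sketch under stated assumptions: normal sampling model with known $\sigma$, $N(\theta, \sigma_0^2)$ base prior, order $k = 4$; the function names and input values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def sensitivity_gradient(xbar, n, theta, sigma0, sigma=1.0, deg=40):
    """a_j = Cov_p(mu, q_j(mu)), j = 2, 3, 4, under the base posterior N(m, v)."""
    v = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)
    m = v * (n * xbar / sigma**2 + theta / sigma0**2)
    t, w = hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)             # quadrature weights for a standard normal
    mu = m + np.sqrt(v) * t
    z = (mu - theta) / sigma0
    grad = []
    for j in (2, 3, 4):
        qj = hermeval(z, [0.0] * j + [1.0]) / sigma0**j
        # Covariance = E[mu * q_j] - E[mu] * E[q_j], all under the base posterior.
        grad.append(np.sum(w * mu * qj) - m * np.sum(w * qj))
    return np.array(grad)

def phi(lam, grad):
    """Local sensitivity phi(lam) = sum_j lam_j * a_j, linear in lam."""
    return float(np.dot(lam, grad))

a = sensitivity_gradient(xbar=1.0, n=15, theta=2.0, sigma0=1.0)
print(a, phi([0.05, 0.0, 0.01], a))
```

Because the base posterior is normal, each coefficient can also be cross-checked with Stein's identity, $\mathrm{Cov}(\mu, g(\mu)) = v\,E[g'(\mu)]$.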


4 Global sensitivity 

Here we use two commonly applied measures of sensitivity - the posterior mean difference and
Kullback-Leibler divergence function - for assessing the global influence of prior perturbation on
posterior mean and prediction, respectively. The following theorem characterizes the difference
between the posterior mean of the base and perturbed models as a function of $\lambda$.

Theorem 2 Let $\Psi(\lambda) = E_p(\mu \mid x,\lambda) - E_p^0(\mu \mid x)$ represent the difference between the posterior
expectations, then

    $$\Psi(\lambda) = \frac{\varphi(\lambda)}{\xi(\lambda)}, \quad \lambda \in \Lambda_\theta. \qquad (8)$$

The function in (8) behaves in an intuitively natural way, for as $\lambda \to 0$ we have $\xi(\lambda) \to 1$, and
consequently $\Psi(\lambda)$ behaves locally in a similar way to $\varphi(\lambda)$.
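The step from Lemma 2 to Theorem 2 can be written out in one line. The following sketch sets $r = 1$ in Lemma 2, writes $\xi(\lambda)$ for the normalising constant of Lemma 1 and $\varphi(\lambda)$ for the local sensitivity of Theorem 1; the symbol names follow the reading of the notation adopted here.

```latex
\begin{align*}
\Psi(\lambda)
  &= \frac{E_p^0(\mu \mid x) + \sum_{j=2}^{k} \lambda_j A_j^1(x)}{\xi(\lambda)} - E_p^0(\mu \mid x) \\
  &= \frac{\sum_{j=2}^{k} \lambda_j \bigl[ E_p^0(\mu\, q_j(\mu) \mid x)
        - E_p^0(\mu \mid x)\, E_p^0(q_j(\mu) \mid x) \bigr]}{\xi(\lambda)} \\
  &= \frac{\sum_{j=2}^{k} \lambda_j\, \mathrm{Cov}_p(\mu, q_j(\mu))}{\xi(\lambda)}
   = \frac{\varphi(\lambda)}{\xi(\lambda)} .
\end{align*}
```

The second line uses $\xi(\lambda) = 1 + \sum_j \lambda_j E_p^0(q_j(\mu) \mid x)$ to bring both terms over the common denominator.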

To assess the influence of the prior perturbation on prediction, we quantify the change in the
divergence in the posterior predictive distribution.

As an illustrative example, suppose the sampling distribution and the base prior model are
respectively $\mathcal{N}(\mu,\sigma^2)$ and $\mathcal{N}(\theta,\sigma_0^2)$. The posterior predictive distribution for the base model is
$\mathcal{N}(\mu_\pi, \sigma_\pi^2 + \sigma^2)$, where

    $$\mu_\pi = \frac{\theta\sigma^2 + n\sigma_0^2\,\bar{x}}{n\sigma_0^2 + \sigma^2}, \qquad \sigma_\pi^2 = \frac{\sigma_0^2\sigma^2}{n\sigma_0^2 + \sigma^2}.$$

Lemma 3 The posterior predictive distribution for the perturbed model is

    $$g_p(y) = \frac{1}{\xi(\lambda)} \Big\{ g_p^0(y) + \Gamma \sum_{j=2}^{k} \lambda_j E^*[q_j(\mu)] \Big\} \qquad (9)$$

in which $g_p^0(y)$ is the posterior predictive density for the base model, $\Gamma$ is a function of
$(y, x, n, \theta, \sigma_0^2, \sigma^2)$ and $E^*(\cdot)$ is expectation with respect to a normal distribution.

For probability measures $P_0$ and $P_1$ with the same support space, $S$, and densities $g_p^0(\cdot)$ and $g_p(\cdot)$,
respectively, the Kullback-Leibler divergence functional is defined by

    $$D_{KL}(P_0,P_1) = \int_S \log\big[ g_p^0(y)/g_p(y) \big]\, g_p^0(y)\,dy$$

which satisfies the following conditions (see Amari (1990)),

1. $D_{KL}(P_0,P_1) \ge 0$, with equality if and only if $P_0 = P_1$.

2. $D_{KL}(P_0,P_1)$ is invariant under any transformation of the sample space.
2 . Dkl{Po, Pi) is invariant under any transformation of the sample space. 
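Theorem 3 The Kullback-Leibler divergence between $g_p^0(\cdot)$ and $g_p(\cdot)$ as a function of $\lambda$ is

    $$D_{KL}(\lambda) = \int \log\big[g_p^0(y)\big]\, g_p^0(y)\,dy + \log[\xi(\lambda)] \qquad (10)$$
    $$\qquad\qquad - \int \log\Big( g_p^0(y) + \Gamma \sum_{j=2}^{k} \lambda_j E^*[q_j(\mu)] \Big)\, g_p^0(y)\,dy, \quad \lambda \in \Lambda_\theta. \qquad (11)$$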
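Lemma 3 and Theorem 3 can be combined into a short numerical routine for the normal example. The sketch below is an illustration under assumptions spelled out in the Appendix proof of Lemma 3: $\Gamma$ equals the base predictive density $\mathcal{N}(y; \mu_\pi, \sigma_\pi^2 + \sigma^2)$, and $E^*$ is taken with respect to a normal distribution with mean $(\sigma_\pi^2 y + \sigma^2\mu_\pi)/(\sigma_\pi^2 + \sigma^2)$ and variance $\sigma_\pi^2\sigma^2/(\sigma_\pi^2 + \sigma^2)$. The function names, grid width and input values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def predictive_and_kl(lam, xbar, n, theta, sigma0, sigma=1.0, deg=40):
    """Perturbed posterior predictive g_p on a grid and D_KL(g_p^0, g_p), order k = 4."""
    v = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)            # sigma_pi^2
    m = v * (n * xbar / sigma**2 + theta / sigma0**2)     # mu_pi
    t, w = hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)
    # Hermite-series coefficients of sum_j lam_j q_j in z = (mu - theta) / sigma0.
    coef = [0.0, 0.0] + [l / sigma0**j for j, l in zip((2, 3, 4), lam)]

    def mean_pert(center, var):
        # E[sum_j lam_j q_j(mu)] for mu ~ N(center, var); center may be an array.
        mu = np.asarray(center)[..., None] + np.sqrt(var) * t
        return np.sum(w * hermeval((mu - theta) / sigma0, coef), axis=-1)

    xi = 1.0 + mean_pert(m, v)                            # normalising constant
    half_width = 12.0 * np.sqrt(v + sigma**2)
    y = np.linspace(m - half_width, m + half_width, 20001)
    g0 = normal_pdf(y, m, v + sigma**2)                   # base predictive (equals Gamma)
    v_star = v * sigma**2 / (v + sigma**2)
    m_star = (v * y + sigma**2 * m) / (v + sigma**2)
    g = g0 * (1.0 + mean_pert(m_star, v_star)) / xi       # perturbed predictive
    kl = np.sum(g0 * np.log(g0 / g)) * (y[1] - y[0])      # Riemann sum for D_KL
    return y, g, kl

y, g, kl = predictive_and_kl((0.1, 0.0, 0.05), xbar=1.0, n=15, theta=2.0, sigma0=1.0)
print(np.sum(g) * (y[1] - y[0]), kl)
```

The routine assumes `lam` lies in the interior of the perturbation space, so that the perturbed predictive density is strictly positive and the logarithm is well defined.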


5 Estimating $\lambda$

To obtain the values of $\lambda$ which give the most sensitive local and global perturbations, as described
in Sections 3 and 4, we apply an optimization approach to the functions (7), (8) and (11). $\varphi(\lambda)$ is a
linear function of $\lambda$ on the space $\Lambda_\theta$ which represents the directional derivative of the mapping $T$ at
$\lambda = 0$. Thus, to obtain the maximum direction of sensitivity, called the worst local sensitivity
direction in Gustafson (1996), we need to maximize $\varphi(\lambda)$ over all the possible directions at $\lambda = 0$
restricted by the boundary of $\Lambda_\theta$. However, $\Psi(\lambda)$ and $D_{KL}(\lambda)$ are smooth objective functions
on the convex space $\Lambda_\theta$, for which we propose a suitable gradient based constrained optimization
method that exploits the geometry of the parameter space. By Definition 2, for a fixed known $\theta$,
the space $\Lambda_\theta$ is a non-empty convex subspace in $\mathbb{R}^{k-1}$ with its boundary obtained by the following
infinite set of hyperplanes

    $$\mathcal{H} = \Big\{ \lambda \;\Big|\; 1 + \sum_{j=2}^{k} \lambda_j q_j(\mu) = 0,\ \mu \in \mathbb{R} \Big\}.$$

Specifically, for the normal example with order $k = 4$, $\mathcal{H}$ is the infinite set of planes of the form

    $$P_\lambda(z) = \frac{z^2 - 1}{\sigma_0^2}\,\lambda_2 + \frac{z^3 - 3z}{\sigma_0^3}\,\lambda_3 + \frac{z^4 - 6z^2 + 3}{\sigma_0^4}\,\lambda_4 + 1, \qquad (12)$$

where $z = (\mu - \theta)/\sigma_0$. Lemma 4 describes the boundary of $\Lambda_\theta$ as a smooth manifold.

Lemma 4 The boundary of $\Lambda_\theta$ is a manifold (smooth surface) embedded in $\mathbb{R}^3$ Euclidean space.


In addition, the interior of $\Lambda_\theta$, which guarantees positivity of $\pi(\mu,\lambda;\theta)$ for all $\mu \in \mathbb{R}$, can be
characterized by the necessary and sufficient positivity conditions of general polynomials of degree
four. The polynomial corresponding to Equation (12) is a quartic with highest degree coefficient
$\lambda_4$; hence, the necessary positivity condition is $\lambda_4 > 0$. Also the comprehensive necessary and
sufficient conditions are given in Barnard and Child (1936) and Bandy (1966). Throughout the rest
of the paper we let $k = 4$, as it gives a perturbation space which is flexible enough for our analysis
and it has been illustrated in Marriott (2002), through examples, that simply increasing the order
of local mixture models does not significantly increase flexibility. However, all the results and
algorithms can be generalized to higher dimensions with possible generalization of the positivity
conditions on polynomials with higher degrees.
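In practice, membership of a candidate $\lambda$ in the interior of the perturbation space can also be checked numerically rather than through closed-form quartic conditions. The sketch below is one such check, swapped in for the analytic conditions cited above: it assumes the normal example with $q_j(\mu) = He_j(z)/\sigma_0^j$ and $z = (\mu - \theta)/\sigma_0$, builds the polynomial $1 + \sum_j \lambda_j q_j$ in $z$, and tests positivity at its real critical points. The function names and tolerances are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def boundary_polynomial(lam, sigma0=1.0):
    """1 + sum_j lam_j q_j as a polynomial in z, coefficients in increasing powers."""
    l2, l3, l4 = lam
    s2, s3, s4 = sigma0**2, sigma0**3, sigma0**4
    return Polynomial([1.0 - l2 / s2 + 3.0 * l4 / s4,
                       -3.0 * l3 / s3,
                       l2 / s2 - 6.0 * l4 / s4,
                       l3 / s3,
                       l4 / s4])

def is_feasible(lam, sigma0=1.0, tol=1e-12):
    """True if the perturbed prior density is strictly positive for every mu."""
    p = boundary_polynomial(lam, sigma0).trim(tol)   # drop negligible leading terms
    deg = p.degree()
    if deg == 0:
        return bool(p.coef[0] > 0.0)
    if deg % 2 == 1 or p.coef[-1] < 0.0:
        return False                                 # unbounded below in one tail
    crit = p.deriv().roots()                         # stationary points (may be complex)
    real_crit = crit[np.abs(crit.imag) < 1e-7].real
    return bool(np.all(p(real_crit) > 0.0))

print(is_feasible((0.1, 0.0, 0.05)), is_feasible((0.0, 0.1, 0.0)))
```

The odd-degree and negative-leading-coefficient branches encode the necessary condition stated in the text: once $\lambda_3$ or $\lambda_4$ is active, the leading coefficient must be positive and of even degree.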


Lemma 5 $\varphi(\lambda)$ attains its maximum value at the gradient direction $\nabla\varphi$ if it is feasible; otherwise,
the maximum direction is the direction of the orthogonal projection of $\nabla\varphi$ onto the boundary plane
corresponding to $\lambda_4 = 0$.
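The rule in Lemma 5 is simple enough to state as code. The following sketch takes the gradient $(a_2, a_3, a_4)$ as given and uses the sign of $a_4$ as the feasibility test, which is an assumption based on the necessary condition $\lambda_4 > 0$; the function name is hypothetical.

```python
import numpy as np

def worst_local_direction(grad):
    """Unit direction maximising phi under the rule of Lemma 5, for grad = (a2, a3, a4)."""
    d = np.asarray(grad, dtype=float).copy()
    if d[2] <= 0.0:
        # Gradient points out of the feasible region: project orthogonally
        # onto the plane lambda_4 = 0 by dropping the last component.
        d[2] = 0.0
    norm = np.linalg.norm(d)
    if norm == 0.0:
        raise ValueError("gradient has no component in a feasible direction")
    return d / norm

print(worst_local_direction([3.0, 0.0, 4.0]), worst_local_direction([3.0, 4.0, -5.0]))
```

Scaling the returned unit vector by a step $\alpha > 0$ and checking positivity of the perturbed prior then gives the admissible range of perturbations along the worst direction.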


$D_{KL}(\lambda)$ and $\Psi(\lambda)$ are smooth functions which can achieve their maximum either in the
interior or on the smooth boundary of $\Lambda_\theta$. Therefore, optimization shall be implemented in two steps:
searching the interior using the regular Newton-Raphson algorithm, and then searching the boundary
using a generalized form of the Newton-Raphson algorithm on smooth manifolds, see Shub (1986)
and also Maroufy and Marriott (2015).


6 Examples 

We consider three examples, where the first two study local and global sensitivity in the normal
conjugate model using the optimization approaches developed earlier to address the questions in
Section 1. In the last example, we address sensitivity analysis in finite mixture models with
independent conjugate prior models for all parameters of interest. Rather than using an optimization
approach, for this example a Markov Chain Monte Carlo method is used and sensitivity of the
posterior distribution of each parameter is assessed. To demonstrate the effect of the perturbation
obtained in each example we compare the posterior distributions before and after perturbation and
also use the relative difference between the Bayes estimates defined by

    $$d = \frac{|E_p(\mu) - E_p^0(\mu)|}{\mathrm{std}_p^0(\mu)}$$

in which $E_p^0(\mu)$ and $E_p(\mu)$ are the Bayes estimates with respect to the base and perturbed models,
respectively, and $\mathrm{std}_p^0(\mu)$ is the posterior standard deviation under the base model. Since $\Psi(\lambda)$
also allows negative values, care must be taken as we may need to minimize this function instead
of maximizing it for achieving the maximum discrepancy between the posterior distributions.

Example 1 (Normal conjugate) A sample of size $n = 15$ is taken from $\mathcal{N}(1,1)$, and the base
prior is $\mathcal{N}(2,1)$. The estimates $\hat\lambda_D = (1.821, -0.014, 0.482)$ and $\hat\lambda_\Psi = (1.817, -0.009, 0.486)$
are obtained from maximizing $D_{KL}(\lambda)$ and minimizing $\Psi(\lambda)$, respectively. The corresponding
relative discrepancies in the Bayes estimate are respectively $d = 1.19, 1.2$; that is, the resulting biases
in estimating the posterior expectation are respectively 119% and 120% of the posterior standard
deviation of the base model. Also, the corresponding posterior distributions are plotted in Figure 1.
Considering the fact that we construct the perturbation space as a family of local mixture
models which are close to the base prior model, these maximum global perturbations are obtained
by searching over a reasonably small space of prior distributions which only differ from the
base prior by their tail behaviour. Hence, these results imply that although conjugate priors are
convenient in applications, they might cause significant bias in estimation as a result of even
plausibly small prior perturbations.

Figure 1: (a) sample, (b) and (c) posterior densities of the base model and perturbed model (dashed)
corresponding to $\hat\lambda_D$ and $\hat\lambda_\Psi$, respectively.


For local analysis, we obtained the unit vector $\lambda_\varphi$ which maximizes the directional derivative
$\varphi(\lambda)$. Figure 2 presents the posterior density displacement by perturbation parameter $\lambda = \alpha\lambda_\varphi$
for different values of $\alpha > 0$, as well as the boundary point $\lambda_b$ in the direction of $\lambda_\varphi$. The
corresponding relative differences in posterior expectation are $d = 0.1, 0.16, 0.25, 0.38, 0.49, 0.56$. Hence,
in addition to giving the worst direction, these values suggest how far one can perturb the
base prior along the worst direction so that the relative discrepancy in posterior mean estimation is
less than, say, 50%.

Figure 2: (a)-(e) posterior densities of the base model and perturbed model (dashed) corresponding
to $\lambda = \alpha\lambda_\varphi$ where $\alpha = 0.05, 0.07, 0.1, 0.13, 0.15$ and (f) for the boundary point in the direction of $\lambda_\varphi$.


Example 2 The central moments of the perturbed prior model, in Definition 3, are linearly
related to the perturbation parameter $\lambda$. Specifically, for the normal model we can check that

    $$\mu_\pi^{(2)} = \sigma_0^2 + 2\lambda_2, \qquad \mu_\pi^{(3)} = 6\lambda_3, \qquad \mu_\pi^{(4)} = \mu_{\pi_0}^{(4)} + 12\sigma_0^2\lambda_2 + 24\lambda_4 \qquad (13)$$

where $\mu_\pi^{(j)}$ represents the $j$th central moment with respect to density $\pi$. Clearly, $\lambda_2$ modifies
variance, $\lambda_3$ adds skewness, and $\lambda_4$ adjusts the tails.
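The moment relations for the normal model can be verified exactly with a few lines of quadrature. This is a minimal sketch, assuming a $N(\theta, \sigma_0^2)$ base prior so that the ratio of perturbed to base prior is $1 + \sum_j \lambda_j He_j(z)/\sigma_0^j$ with $z = (\mu - \theta)/\sigma_0$; the function name and the test values of $\lambda$ are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def perturbed_prior_central_moments(lam, sigma0=1.0, deg=20):
    """Second, third and fourth central moments of the perturbed normal prior (k = 4)."""
    t, w = hermegauss(deg)                  # t plays the role of z under the base prior
    w = w / np.sqrt(2.0 * np.pi)
    # Hermite-series coefficients of the ratio pi / pi_0 = 1 + sum_j lam_j He_j(z) / sigma0^j.
    coef = [1.0, 0.0] + [l / sigma0**j for j, l in zip((2, 3, 4), lam)]
    ratio = hermeval(t, coef)
    # The mean stays at theta, so central moments are moments of (mu - theta) = sigma0 * z.
    return [float(np.sum(w * (sigma0 * t) ** r * ratio)) for r in (2, 3, 4)]

lam, sigma0 = (0.1, 0.02, 0.05), 1.5
print(perturbed_prior_central_moments(lam, sigma0))
print([sigma0**2 + 2 * lam[0], 6 * lam[1], 3 * sigma0**4 + 12 * sigma0**2 * lam[0] + 24 * lam[2]])
```

The second printed list evaluates the closed-form relations directly, using $3\sigma_0^4$ for the fourth central moment of the base normal prior, so the two lines should agree to rounding error.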

Suppose that elicited prior knowledge tells us that the perturbed prior is required to stay
symmetric; then the perturbation space must be modified by the extra restriction $\lambda_3 = 0$, which
gives zero skewness. Consequently, we should be exploring the restricted space, say $\tilde\Lambda_\theta$, instead
of $\Lambda_\theta$, for the worst direction and maximum global perturbation. $\tilde\Lambda_\theta$ is a 2-dimensional cross
section obtained from the intersection of $\Lambda_\theta$ with the plane defined by $\lambda_3 = 0$. Hence the
boundary properties are preserved. For the same data as in Example 1, sensitivity in the worst
direction returns $d = 0.1, 0.16, 0.26, 0.42, 0.57, 0.64$ (Figure 3). Also, minimizing $\Psi(\lambda)|_{\lambda_3 = 0}$ returns
$\hat\lambda_\Psi = (1.837, 0.494)$.

Two observations can be made from these results. First, as in Example 1, although we have
restricted the perturbation space further, there are still noticeable discrepancies in posterior
densities caused by perturbation along the worst direction. Second, the results agree with those in
Example 1, where the estimate of $\lambda_3$ does not seem to be significantly different from zero, and the
other two parameter estimates are quite close in both examples.

Figure 3: (a)-(e) posterior densities of the base model and perturbed model (dashed) corresponding
to $\lambda = \alpha\lambda_\varphi$ where $\alpha = 0.05, 0.07, 0.1, 0.13, 0.15$ and (f) for the boundary point in the direction of $\lambda_\varphi$.

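Example 3 (Finite Mixture) Using a missing value formulation, the likelihood function of the
mixture model $p\,\mathcal{N}(x;\mu_1,\sigma_1) + (1 - p)\,\mathcal{N}(x;\mu_2,\sigma_2)$ can be written as follows

    $$L = \prod_{j=1}^{2} p_j^{\,n_j} \prod_{i \in A_j} \mathcal{N}(x_i;\mu_j,\sigma_j)$$

where $A_j = \{i \mid w_i = j\}$, $n_j$ is the size of $A_j$, $p_1 = p$, $p_2 = 1 - p$, and $w_i$ is the latent missing
variable for $x_i$ such that $p(w_i = 1) = p$, and $p(w_i = 2) = 1 - p$. The marginal conjugate base prior
models are $\mu_j \sim \mathcal{N}(\theta_j,\sigma_{0j})$, $\sigma_j^{-2} \sim \Gamma(k_j,\tau_j)$, and $p \sim \mathrm{Beta}(\alpha,\beta)$, $(j = 1, 2)$.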

In this example the base prior model can be split into five independent components and,
correspondingly, five independent perturbation spaces are naturally defined. Unlike previous examples,
where we find the maximum local and global perturbations, here we explore each marginalized
perturbation space by generating perturbation parameters and observing their influence on the
posterior model of parameters of interest.

Specifically, we use Markov Chain Monte Carlo Gibbs sampling for estimating the marginal
posterior distribution of all parameters of interest, corresponding to the base and perturbed
models. Each perturbation parameter is generated, independently from the rest, through a
Metropolis algorithm with a uniform proposal distribution. Figure 4 shows the histograms of the generated
sample for an observed data set of size $n = 15$ from $0.4\,\mathcal{N}(x;-1,1) + 0.6\,\mathcal{N}(x;1,1)$, and the
hyper-parameters are set to be $\theta_1 = -1.5$, $\theta_2 = 0.5$, $\tau_1 = \tau_2 = 1$, $k = 2$ and $\alpha = \beta = 1$.

Comparing the two histograms for each parameter, the posterior models for $\mu_1$ and $p$ show
significant differences between the base and perturbed models. The marginal relative differences are
$d = 0.49, 0.11, 0.40, 0.59, 0.71$, respectively for $(p, \mu_1, \mu_2, \sigma_1, \sigma_2)$. These differences are not as
significant as those in the previous examples, since they do not correspond to maximum
perturbations; instead, they return the average influences over all generated perturbation parameter
values.

The examples of this paper have explored the perturbation space in three ways. In Example
1 we look for the worst possible perturbation, both locally and globally. In Example 2 we add
constraints to the perturbation space, representing prior knowledge, and again look for maximally
bad local and global perturbations. Finally, in Example 3, we marginalise over the perturbation
space - rather than optimising over it - as a way of dealing with the uncertainty of the prior.

Appendix

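Proof 1 (Lemma 1)

    $$\pi_p(\mu,\lambda \mid x;\theta) = \frac{\pi(\mu,\lambda;\theta)\,f(x;\mu)}{g(x,\lambda)} \qquad (14)$$

where

    $$g(x,\lambda) = \int \pi(\mu,\lambda;\theta) f(x;\mu)\,d\mu
        = \int f(x;\mu)\pi_0(\mu;\theta)\,d\mu + \sum_{j=2}^{k} \lambda_j \int q_j(\mu,\theta) f(x;\mu)\pi_0(\mu;\theta)\,d\mu
        = g(x)\Big\{1 + \sum_{j=2}^{k} \lambda_j E_p^0[q_j(\mu,\theta)]\Big\}, \qquad (15)$$

since $f(x;\mu)\pi_0(\mu;\theta) = g(x)\pi_p^0(\mu \mid x,\theta)$ and $g(x) = \int f(x;\mu)\pi_0(\mu;\theta)\,d\mu$, where $g(x)$ is the
marginal density of the sample in the base model. Hence,

    $$\pi_p(\mu,\lambda \mid x;\theta)
        = \frac{f(x;\mu)\pi_0(\mu;\theta)\big\{1 + \sum_{j=2}^{k} \lambda_j q_j(\mu,\theta)\big\}}{g(x)\big\{1 + \sum_{j=2}^{k} \lambda_j E_p^0[q_j(\mu,\theta)]\big\}}
        = \frac{\pi_p^0(\mu \mid x,\theta)}{\xi(\lambda,\theta)}\Big\{1 + \sum_{j=2}^{k} \lambda_j q_j(\mu,\theta)\Big\}$$

with $\xi(\lambda,\theta) = 1 + \sum_{j=2}^{k} \lambda_j E_p^0[q_j(\mu,\theta)]$.

Also $\xi(\lambda,\theta) > 0$, since $h^*(\mu;\lambda,\theta) > 0$ for all $\mu \in \mathbb{R}$ and $\lambda \in \Lambda_\theta$, and
$\xi(\lambda,\theta) = E_p^0(h^*(\mu;\lambda,\theta))$.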

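Proof 2 (Lemma 2) The result follows by direct calculation and using the fact that

    $$A_j^r(x) := \int \mu^r q_j(\mu)\, \pi_p^0(\mu \mid x)\,d\mu = E_p^0[\mu^r q_j(\mu)]. \qquad (16)$$

Proof 3 (Theorem 1) Substitute $u^*(\cdot)$ by $u(\cdot)$ in (Gustafson 1996, Result 8).

Proof 4 (Theorem 2) By direct calculation and use of equation (16).

Proof 5 (Lemma 3)

    $$g_p(y) = \int f(y;\mu)\,\pi_p(\mu,\lambda \mid x)\,d\mu \qquad (17)$$

which, under the base model, is the convolution of $\mathcal{N}(\mu,\sigma^2)$ and $\mathcal{N}(\mu_\pi,\sigma_\pi^2)$. Since

    $$\frac{(y-\mu)^2}{\sigma^2} + \frac{(\mu-\mu_\pi)^2}{\sigma_\pi^2}
        = \frac{\sigma^2 + \sigma_\pi^2}{\sigma^2\sigma_\pi^2}\Big(\mu - \frac{\sigma_\pi^2 y + \sigma^2\mu_\pi}{\sigma^2 + \sigma_\pi^2}\Big)^2
        + \frac{(y-\mu_\pi)^2}{\sigma^2 + \sigma_\pi^2},$$

the posterior predictive distribution for the base model is $\mathcal{N}(\mu_\pi, \sigma_\pi^2 + \sigma^2)$, and (9) is obtained
by direct calculation, where

    $$\Gamma = \frac{1}{\sqrt{2\pi(\sigma_\pi^2 + \sigma^2)}} \exp\Big(-\frac{(y-\mu_\pi)^2}{2(\sigma_\pi^2 + \sigma^2)}\Big)$$

and $E^*(\cdot)$ is expectation with respect to $\mu$ according to the following normal distribution

    $$\mathcal{N}\Big(\frac{\sigma_\pi^2 y + \sigma^2\mu_\pi}{\sigma_\pi^2 + \sigma^2},\ \frac{\sigma_\pi^2\sigma^2}{\sigma_\pi^2 + \sigma^2}\Big).$$

Proof 6 (Theorem 3) Use of Lemma 3 and direct calculation finishes the proof.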


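Proof 7 (Lemma 4) Let $\sigma_0 = 1$ in equation (12) for convenience and fix $\lambda_4$. From solving
$P_\lambda(z) = 0$ and $P'_\lambda(z) = 0$ simultaneously for $\lambda_2$ and $\lambda_3$, we get a smooth parametrization
for the boundary as follows

    $$\lambda_2(z) = \frac{\lambda_4\,(z^6 - 3z^4 + 9z^2 + 9) - 3z^2 + 3}{z^4 + 3}, \qquad
      \lambda_3(z) = \frac{2z\,\big[1 - (z^4 - 2z^2 + 3)\lambda_4\big]}{z^4 + 3}. \qquad (18)$$

Hence, by the implicit function theorem (Rudin 1976, p.225) the boundary of $\Lambda_\theta$ is a smooth surface
(manifold) embedded in $\mathbb{R}^3$ by

    $$S_\lambda : \mathbb{R} \times \mathbb{R}^{+} \to \mathbb{R}^3, \qquad
      S_\lambda(z,\lambda_4) = [\lambda_2(z,\lambda_4),\ \lambda_3(z,\lambda_4),\ \lambda_4]. \qquad (19)$$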
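The boundary parametrization can be checked numerically. The sketch below, with $\sigma_0 = 1$ as in the proof, evaluates $\lambda_2$ and $\lambda_3$ obtained from solving $P_\lambda(z) = 0$ and $P'_\lambda(z) = 0$, and confirms that both equations hold at the chosen $z$. The function names and the value $\lambda_4 = 0.3$ are illustrative only.

```python
import numpy as np

def boundary_point(z, lam4):
    """(lambda_2, lambda_3, lambda_4) solving P(z) = 0 and P'(z) = 0 for sigma_0 = 1."""
    den = z**4 + 3.0
    lam2 = (lam4 * (z**6 - 3.0 * z**4 + 9.0 * z**2 + 9.0) - 3.0 * z**2 + 3.0) / den
    lam3 = 2.0 * z * (1.0 - (z**4 - 2.0 * z**2 + 3.0) * lam4) / den
    return np.array([lam2, lam3, lam4])

def P(z, lam):
    """Positivity polynomial 1 + sum_j lam_j He_j(z) for order k = 4."""
    return 1.0 + lam[0] * (z**2 - 1) + lam[1] * (z**3 - 3 * z) + lam[2] * (z**4 - 6 * z**2 + 3)

def dP(z, lam):
    """Derivative of P with respect to z."""
    return 2 * z * lam[0] + lam[1] * (3 * z**2 - 3) + lam[2] * (4 * z**3 - 12 * z)

for z in (-2.0, -0.5, 0.0, 1.3):
    lam = boundary_point(z, lam4=0.3)
    print(z, lam, P(z, lam), dP(z, lam))
```

The check only confirms tangency, that is a double root of the polynomial at $z$; whether a given point lies on the boundary of the feasible set also requires the polynomial to be non-negative elsewhere.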

Proof 8 (Lemma 5) $\nabla\varphi = (a_2, a_3, a_4)$ is a vector originating at $\lambda = 0$, where $a_j = \mathrm{Cov}_p(\mu, q_j(\mu))$.
If it is feasible then it clearly gives the maximum direction. However, if it is not feasible then $a_4 < 0$,
since the condition $a_4 > 0$ is necessary for feasibility. Thus, the direction of the orthogonal
projection of $\nabla\varphi$ onto the boundary plane corresponding to $\lambda_4 = 0$ is the closest we get to a maximum
and feasible direction.